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In recent years, Bayesian methods have been proposed as a solution to a wide range of issues in 
quantum state and process tomography. State-of-the-art Bayesian tomography solutions suffer from 
three problems: numerical intractability, a lack of informative prior distributions, and an inability to 
track time-dependent processes. Here, we address all three problems. First, we use modern statistical 
methods, as pioneered by Huszar and Houlsby [1] and by Ferrie [2], to make Bayesian tomography 
numerically tractable. Our approach allows for practical computation of Bayesian point and region 
estimators for quantum states and channels. Second, we propose the first priors on quantum states 
and channels that allow for including useful experimental insight. Finally, we develop a method that 
allows tracking of time-dependent states and estimates the drift and diffusion processes affecting a 
state. We provide source code and animated visual examples for our methods. 


I. INTRODUCTION 

Quantum state and process tomography are impor¬ 
tant methods for diagnosing and characterizing imper¬ 
fections in small quantum systems. By fixing prob¬ 
lems in models and implemenfafions, and by having 
a well-characferized sysfem, we may hope fo compose 
mulfiple sysfems fo build reliable larger quanfum sys- 
fems. These larger sysfems require a scalable approach 
fo characferizafion, such as using fhe mafrix-producf 
sfafe ansafz [3] or informafion localify [4, 5]. Quan¬ 
fum tomography has seen many improvements since 
its inception [6]. In particular, tomography has enjoyed 
advances in providing maximum-likelihood estimators 
[7], region estimators [8-10], model selection [11, 12], 
hedging [13], and compressed sensing [14,15]. 

These techniques, though powerful, do nof fake ad- 
vanfage of prior informafion available fo experimen- 
falisfs. Such prior informafion can include knowledge 
gained in building fhe experimenf, or in performing 
similar experimenfs, as well as knowledge gained from 
fhe calibrafion leading up fo an experiment of inferesf. 
A class of fechniques fhaf allow one fo include prior in¬ 
formafion is called Bayesian esfimafion. 

Bayesian fechniques in fhe confexf of quanfum to¬ 
mography were firsf suggested by Jones [16], Slater [17], 
Derkaeffl/. [18], Buzekefa/. [19], and Schackefa/. [20]. In 
addition to the inclusion of prior informafion, Bayesian 
esfimafion also naturally includes several other exper¬ 
imental advantages, such as optimality [21-23], adap- 
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live experimental design [1, 24], robust region estimates 
[25] and model selection criteria [2, 26]. These advan¬ 
tages arise from fhe facf fhaf Bayesian mefhods provide 
a complete characferizafion of fhe currenf sfafe of an ex- 
perimenfalisf's knowledge affer each dafum. 

Given fhe many proposals for and advanfages of 
Bayesian mefhods, the lack of adoption of Bayesian 
methods in experimental tomography, with the excep¬ 
tion of some recent work [24,27], seems to be primarily 
a practical problem. Bayesian methods are rarely ana¬ 
lytically tractable. Further numerical implementations 
of Bayesian methods (for inference of many variables) 
are a small scale software engineering project and com¬ 
putationally expensive. Bayesian reasoning, in the clas¬ 
sical statistics literature, is typically realized with effi¬ 
cient well-known classical algorithms which go by the 
names of particle filtering [28] and sequential Monte 
Carlo [29]. Even though these algorithms can be numer¬ 
ically efficient, for multiple variables they are non-trivial 
to implement and optimize. In the context of quantum 
state tomography sequential Monte Carlo has been ap¬ 
plied to adaptive tomography [1] and model selection 
[2], generalizing and dramatically simplifying earlier ef¬ 
forts based on Kalman filtering [30]. However, much of 
the code developed for this application is either special¬ 
ized to particular cases, or has not been released to or 
adopted by the community. Releasing reusable code is 
critical not only for practicality in experiments, but also 
for producing reproducible research results [31]. 

^Another difference between the application of 
Bayesian methods in classical statistics and its appli¬ 
cation to quantum state and processes tomography is 
the lack of choice in priors. Experimentalists spend 
many hours designing, testing and calibrating their 
platforms. It would he nice to include the prior in- 
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formation from this lengthy process into a quantum 
estimation procedure. Classically one can choose from 
many different priors, which could be "informative" 
or "uninformative," depending on the domain of the 
probability distribution. In the quantum setting, many 
canonical priors are unitarily invariant and have some 
uninformative distribution over purity [32]. The lack 
of alternatives can be explained both by the lack of 
theoretical knowledge on how to construct such priors 
and more importantly there is no general purpose 
software available that could make use of such priors 
once constructed. 

Finally, there is a critical issue facing the tomogra¬ 
phy community, independent of the respective merits of 
frequentist or Bayesian approaches. It is the fact that 
the output of realistic quantum sources vary in time, 
both deterministically ("drift") and stochastically ("dif¬ 
fusion"). Incorporating time-dependence in quantum 
characterization protocols has been a topic of significant 
recent interest [12, 33-37]. For the most part, model se¬ 
lection has been used in tomographic experiments to 
detect drift or diffusion. While useful, this does not 
yield a protocol for incorporating that drift (or diffusion) 
into the estimation protocol once it has been detected. 
Moreover, many of the current solutions are not general 
purpose, nor are they practical in a range of important 
applications. For instance, see the proposal of Blume- 
Kohout et al. [38], which requires quantum memory on 
the order of the number of samples collected so that the 
DeFinetti conditions are met. This is clearly impractical, 
and further precludes adaptivity. 

In this work, we address all three issues. The unifying 
theme is the use of the sequential Monte Carlo (SMC) 
algorithm to perform tomography. 

To address the issue of adoption we implement SMC- 
based tomography using an open-source library for 
Python [39] that integrates with the widely-used QuTiP 
library for quantum information [40], and can be used 
with modem instrument control software [41]. The tech¬ 
niques that we introduce in this paper can readily be ap¬ 
plied in experimental practice—we provide a tutorial on 
software implementations in Appendix F. 

Second, we give a constructive method for defining 
priors that represent initial estimates informed by ex¬ 
perimental insight. We use sequential Monte Carlo to 
avoid the need to write an analytic form for our prior. 
This is especially useful in the context of quantum state 
and process tomography, where analytic expressions are 
available for only a few distributions. Instead, SMC only 
requires that we provide a means to sample from priors. 
To do so, we first draw sample states or charmels from 
a reference or "flat" prior, then transform each sample 
by a quantum operation drawn from an ensemble. The 
only input required to define this ensemble is a prior es¬ 
timate of the state. Our method then guarantees that this 
becomes the mean of our new prior. 

Third, we incorporate tracking of drifting and dif¬ 
fusing evolution into state tomography by using tech¬ 


niques from computer vision which perturb a sample. 
Each such perturbation is drawn from a Gaussian whose 
mean and variance represent deterministic and stochas¬ 
tic evolution respectively. 

This article is structured as follows. We begin by 
reviewing Bayesian tomography in Section II. Then 
we describe prior work on priors for quantum states 
and channels in Section III. In Section IV we trans¬ 
form these priors into new priors for states and chan¬ 
nels that include experimental insight. Using Bayesian 
tomographic methods, we then describe how to track 
quantum states and channels as a function of time in 
Section V. In Section VI we illustrate these ideas and 
more, then conclude in Section VII. 

11. BAYESIAN TOMOGRAPHY 

To infer the state of a quantum system of dimension 
D, we perform a set of measurements on N identically 
and independently (iid) prepared quantum systems. As 
is usually the case, we will restrict to measuring each 
system separately. In general, one could consider per¬ 
forming a different generalized measurement on each 
system, and the kind of generalized measurement could 
adaptively depend on all prior measurements. This can 
be included in the expressions below at the cost of addi¬ 
tional notational baggage. 

Consider a single positive operator valued measure 
(POVM) {£*:} whose K elements represent the outcomes 
of measurements. If we perform this generalized mea¬ 
surement on all N systems it results in a string of mea¬ 
surement results At = {Mi,M 2 , ...,Mjv}, where M,- is 
the POVM element obtained in the fth trial. Let be 
the number times that Ejt is observed in At, i.e. fre¬ 
quency of Ej-. The statistical information measurement 
outcomes have about the preparation is described by 
a likelihood function; that is, a probability distribution 
over measurement records, conditioned on a hypothe¬ 
sis p about the state. In particular, by using Born's rule 
Pr(£j-|p) = Tr(Efc|0) to write out the likelihood, we ob¬ 
tain that 

N 

C{p)=Fv{M\p)=YlTv[M,p] (la) 

i=l 

= Tr [Eip]”i Tr [£ 2 ^]"^ • • ■ Tr [E^p]"^ (lb) 

Moreover, by using the Choi-Jamiotkowski isomor¬ 
phism, we can associate a state }{A)/D with each quan¬ 
tum charmel A. As we detail in Appendix A, prepa¬ 
ration and measurement in a process tomography ex¬ 
periment can be written as a measurement of the state 
/(A) /D, such that (la) also includes this case. 

In general, a likelihood function completely models 
an experiment by specifying the probability of observ¬ 
ing any measurement, conditioned on the h 5 rpothesis 
we would like to learn. Thus, the likelihood function 
serves as the basis for subsequent estimation. 
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For instance, in maximum likelihood estimation 
(MLE), an estimate p of the state p is formed by 


inner producf ((A|B)) = Tr[A'*^B]. In fhe D-dimensional 
Hilberf space, a sfafe mafrix can be represenfed as 


Pmle := argmax/:(p). (2) 

P 


Here, however, we use fhe likelihood funcfion fo insfead 
perform Bayesian inference as described by Jones [16], 
Slafer [17], Derka et al. [18], Buzek et al.[19], and Schack 
et al. [20]. Below we follow Blume-Kohouf's [22] presen- 
fafion closely. We begin by using Bayes' rule. 


Vr{p\M) dp 


Pr(Al|p) Pr(p) dp 
Pr(M) 


( 3 ) 


where n{p) := Pr(p)dp is called fhe prior disfribufion, 
and 

A/' = Pr(Al) = y dpPr(Ad|p) Pr(p) (4) 

is a normalization consfant. We shall wrife p ~ tt fo 
indicafe fhat fhe random variable p is drawn from fhe 
prior n. 

In fhe nexf fwo Secfions, we will refum fo fhe quesfion 
of how fo choose fhe prior disfribufion Pr(p)dp. Hence- 
forfh, we drop fhe measure dp unless we are infegraf- 
ing a disfribufion, as we will lafer use a numerical algo- 
rifhm which approximafes fhis confinuous disfribufion 
by a discrefe disfribufion. 

The Bayesian mean esfimafe (BME) is fhen given by 
fhe expecfafion 

|6bme(X) := Ep[p|Al] = JdppFr{p\M), (5) 

where E indicafes an expecfafion value, and where 
conditional bars and fhe subscripf denofe fhe disfribu- 
tion fhe expecfafion is faken over e.g. 'Ex\f{x)\y\ = 
Ex/(^) Pr(x|i/) denofes fhe conditional expecfafion of 
f{x) given y. 

The Bayesian mean estimator is an opfimal esfima- 
for for any sfricfly proper scoring rule on sfafes [21]. 
These scoring rules arise from Bregman divergences [42] 
such as fhe Kullback-Leibler divergence or fhe quadratic 
toss Lq(p,p) := Tr [(p - p)Q[p - p]], where Q is a posi- 
five semidefinife superoperafor [43]. As we will discuss 
in more defail below, fhe error incurred by fhe BME is 
well-characferized by spread of samples from fhe posfe- 
rior disfribufion. Imporfanfly, if one uses infidelify as a 
toss funcfion, fhe BME remains approximafely opfimal, 
even though the infidelity is not a Bregman divergence 
rule [23]. 

To make the problem of estimating states and chan¬ 
nels more concrete, it is helpful fo specify a real-valued 
parameterization of fhe tomographic model. We sfarf 
by considering fhe space of linear operafors acting on a 
D-dimensional Hilberf space. We represenf an operafor 
wifh an "operafor kef" |A)) corresponding fo A, while 
the dual vector is the corresponding "bra" ((A| and rep¬ 
resents . This vector space has the Hilbert-Schmidt 


P = (6) 

«=1 

where x^ = ((Ba|p)) for a basis of Hermifian operafors 
{Ba} fhaf is orfhonormal under the Hilbert-Schmidt in¬ 
ner product ((Ba|B^)) = Tr[B^B^] = Eor simplicity, 

we choose Bq = 1 / a/D to be the only traceful element. 
The corresponding operator ket representation of p is 


||0)) = 

/ xo \ 

Xi 

X2 


/ ((Bo If)) \ 
((Blip)) 
((B2IF)) 


( I/VD \ 
Tr[Bip] 
Tr[B2p] 


\ Xq2_i / 


[{{Bd^-i\p))) 


1 Tr[BD2_ip] j 

(7) 


As a consequence of p being considered as a random 
variable, each parameter X; is also a random variable. 
Thai is, we have represenfed p in ferms of a vector- 
valued random variable |p)); because we have chosen 
a Hermifian basis, |p)) is also a real vecfor. Moreover, 
each paramefer x, is fhen by definifion equal fo fhe 
mean value (B;) of fhe observable B„ faken over possible 
measuremenf oufcomes. In general, tensor producfs of 
Pauli mafrices and generalized Gell-Mann mafrices can 
be used as such a Hermifian basis for mulfiple finife- 
dimensional quanfum sysfems. 

Having chosen a parameferizafion in ferms of observ¬ 
ables, we can now reason abouf the error in parameters 
of p. Suppose fhaf for a given posterior, p is normally- 
disfribufed abouf fhe esfimafed sfafe p, fhen fhe disfri¬ 
bufion over p is fully described by ifs mean, i.e. (5), and 
covariance crip between x, and Xj (fhe i and j componenfs 
of p), 

(Tpj = CoVy ( Ip)) ) = Ep [(x,- - Ep [x;]) {Xj - Ep [Xj])]. (8) 
The covariance mafrix for fhe posferior is fhen 

Ep = Ep[|p - E[p])) ((p - E[p]|] = Cov(lp))) 

/O 0 0 ... 0 \ 

0 O'!,! D'U ••• CTpDZ-l 

= 0 cri2 0 - 2^2 ^2,D2-1 . 

0 : : 

V ^ ^1,D2-1 ^2,D2-1 • • ■ <^d2-1,d2-1 / 

Because we have wriffen fhe covariance mafrix in fhe 
basis of Ip)), we can apply Ep as a superoperafor on lin¬ 
ear operafors. The action of Ep on an observable X fhen 
gives fhe variance of fhe value faken on by measure- 
menfs of X in ferms of fhe law of fofal variance as 

V[X] ^ Vp[(X)p] + {x2)e[p] - (X)I^] 

= ((X|Ep|X)) + ((X|X|p))-(X)|jp], 
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where Vp is the variance over the state p and where {■)p 
is the expectation value over measurement outcomes, 
conditioned on the state p. Note that although we have 
used a coordinate form to arrive at the above expression, 
it is basis independent. 

As discussed in detail by Blume-Kohout [22], the co- 
variance matrix Ap describes a credible region (ellip¬ 
soid) up to a scaling parameter Z corresponding to the 
level of fhe region [44]. Specifically, fhe eigenvecfors 
and eigenvalues of Ap are fhe principle axes and lengfhs 
of fhe axes respecfively. Minimizing an appropriafe 
norm of Ep fhus provides a nafural objective funcfion 
for adapfively designing fomographic experimenfs, as 
we will discuss furfher in Secfion VI. 

Refurning fo fhe problem of finding posferiors, we 
nofe fhaf in practice, fhe infegral in (5) is rarely analyf- 
ically fracfable. Thus, Blume-Kohouf suggesfed an ap¬ 
proximation such as fhe Mefropolis-Hasfings algorifhm 
could be used insfead [22]. Rejection sampling mefhods 
such as Mefropolis-Hasfings fend fo be prohibifively ex¬ 
pensive, however, and suffer from vanishingly small 
accepfance probabilities as dafa is collecfed. Though 
fhere have been recenf advances in rejection sampling 
for Bayesian inference [45], fhe assumption of a normal 
posferior is difficulf fo use in fhe confexf of quanfum 
tomography. Consequently, we instead follow the ap¬ 
proach of Huszar and Houlsby [1], and later Ferrie [25], 
and use the sequential Monte Carlo (SMC) algorithm 
[29], which does not rely on the assumption of a nor¬ 
mal posterior. A brief review of SMC can be found in 
Appendix B and the references therein. 

SMC offers the advantage that we need not explicitly 
write down a prior, but instead treat Bayes' rule as a 
transition kernel that transforms prior h 5 rpotheses called 
particles [46] into samples from the posterior. These par¬ 
ticles are then used to approximate the integral (5). For 
tomography, each particle represents a particular hy¬ 
pothesis about the state p, so that the prior Pr(p)dp can 
be written under the SMC approximation as [1] 

Pr(p)Ri ^ WpS{p-pp), (11) 

;7Gparticles 

where zup reflects the relative plausibility of the corre¬ 
sponding state conditioned on all available evidence. 
Initially, Wp is taken to be uniform, as the density of sam¬ 
ples carries information about prior plausibility. After 
updating the particles based on experimental observa¬ 
tions, we can readily calculate the Bayesian mean es¬ 
timator and posterior covariance matrix by summing 
over the particle approximation. 

Before moving on, we wish to point out that SMC 
also allows for more sophisticated credible region 
estimators— we focus here on the covariance region es¬ 
timator for simplicity. In particular, any set of particles 
Pcc such that iVp > 1 — a forms an a-credible re¬ 

gion. Taking a convex hull over such a region then pro¬ 
vides a region which naturally includes the convexity of 
state space, and a minimum-volume enclosing ellipsoid 


over a credible region yields a compact description [25]. 
Both of these credible region estimators are included in 
the open-source package that we rely on for numerical 
implementations, QInfer [39]. Thus, we inherit a vari¬ 
ety of practical data-driven credible region estimators. 
For the remainder of this work, we choose to use to use 
naive "3(r" covariance ellipsoids for the purpose of il¬ 
lustration, by which we mean that we take the ellipsoid 
defined by the covariance matrix and scale it by a factor 
of Z = 3. 


III. DEFAULT PRIORS: THE SAMPLING OF STATES 
AND CHANNELS 

In sequential Monte Carlo, we need to be able to 
draw samples from a prior, see (11). In this Section we 
briefly review how to draw samples from several well- 
established priors [16]. Loosely speaking, these priors 
are useful as they define a notion of uniformity over 
states and charmels, and do not posit any prior esti¬ 
mate other than the maximally mixed state. Following 
the advice of Wasserman [47], we will term these well- 
established priors as default priors, as each of them is a 
reasonable choice to adopt as a prior in lieu of more de¬ 
tailed information. 

Later, in Section IV, we will take priors as algorithmic 
inputs that define what states are feasible for a given to¬ 
mographic experiment. We will refer to priors that can 
be used in this way as fiducial. From the perspective 
of the assumptions our algorithm makes, we require in¬ 
puts to have the property that 

^P~7z[p] = J p n{p) = J pFr{p)dp = (12) 

(fiducial prior) 

that is, that the mean of the prior is the maximally mixed 
state. All of the default priors described in this section 
are fiducial in the sense given in (12). Similarly, we say 
that a prior is insightful if its mean is anything other than 
a maximally-mixed state. In Section IV, we consider pri¬ 
ors that are insightful by our definition, that is 

Ep-'/rM =J pn{p) = j pPr(p)dp = pp, (13) 

(insightful prior) 
where pp 1/D. 

In constructing default priors, we will make repeated 
use of random complex-valued matrices with entries 
sampled from normal distributions. Such matrices 
form the Ginibre matrix ensemble [48]. We provide 
pseudocode for all default sampling algorithms in Ap¬ 
pendix C. For brevity, we refer to algorithms by the 
initials of their authors; for instance, we refer to Algo¬ 
rithm 1 as the ZS algorithm after Zyczkowski and Som¬ 
mers [49]. 
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A. Priors on States 

For pure quantum states, the canonical default prior 
is the Haar measure. One can easily sample states from 
fhis measure by sampling a vecfor in wifh Gaussian- 
disfribufed enfries, fhen renormalizing. Alfernafively, 
one can sample unifary mafrices uniformly according 
fo fhe Haar measure as defailed in fhe Mezzardi algo- 
rifhm [50], see Algorithm 2. A random pure state is 
then a Haar-random unitary applied to a fiducial sfafe. 
Nofe fhaf fhe Haar measure is fiducial; fhaf is, if makes a 
predicfion of fhe maximally-mixed sfafe, in fhe sense of 
( 12 ). 

Generalizing fo mixed sfafes, we consider two well- 
known ensembles of random sfafes. Firsf, we consider 
sfafes drawn from fhe Ginibre ensemble, a generaliza- 
fion of fhe Fhlberf-Schmidf ensemble fhaf allows for re- 
sfricfions on rank. Second, we consider fhe ensemble of 
sfafes drawn from fhe Bures measure. 

Samples from bofh ensembles are neafly capfured by 
a single equation 


As defailed by Bruzda et al. [52], fo generafe a channel 
A : ^ C^, fhe BGSZ algorithm (see Algorithm 5) 

begins by selecting a Ginibre-random density operator 
p of dimension and a fixed rank K. For nofafional 
simplicify, we lef p be an operafor acfing on fhe biparfife 
Hilbert space (8 C^. The trace-preserving condition 
is then imposed by letting Y = Tr 2 p be the partial trace 
over the second copy of C^, fhen fransforming p info 
fhe Ghoi sfafe of fhe sampled channel Aga^ple t>y 

/(Asample)/D = ^ ^ (^ 5 ) 

If is easy fo verify fhaf charmels Agampie sampled in fhis 
way are indeed frace-preserving and complefely posi¬ 
tive. Moreover, the transformation above preserves the 
property that ]E[/(Asajnpie)/D] = 1/D, such that the 
mean of fhe BGSZ disfribufion is fhe complefely depo¬ 
larizing channel. This condition on charmel priors is 
precisely that given as (12). We will show below that 
the BGSZ distribution suffices fo consfrucf a prior over 
charmels fhaf is insightful. 


{t + U)AA\t + Ufi 

Fsample = , .1,^ , . (14 IV. INSIGHTFUL PRIORS FOR STATES AND 

Tv[{t + U)AA^t + W)] CHANNELS 


where U is eifher fhe idenfify or a Haar-random unifary, 
and A is a D X X Ginibre mafrix. If U is faken fo be fhe 
identify, fhen fhe sfafe is drawn from fhe Ginibre ensem¬ 
ble with rank K and is unitarily invariant. This means 
the prior will only have support on states with rank less 
than or equal to K. If A is faken fo be a D x D Ginibre 
mafrix and U is Haar-random, fhen fhe sfafe is drawn 
from fhe Bures measure. Thus, Psample ~ Ginibre(D,X) 
or Psample ^ Bures(D), respecfively. These samples can 
fhen serve as an fiducial prior for SMG, in fhe sense de¬ 
scribed by equations (11) and (12), or can be fransformed 
info samples from a prior fhat is insightful, as described 
in Secfion IV. 

These procedures are given by Algorifhm 3 and Algo- 
rifhm 4 respecfively [51]. 


B. Priors on Channels 

In developing applicafions fo quanfum process to¬ 
mography (QPT), we use fhe facf fhaf learning fhe Ghoi 
sfafes of unknown channels is a special case of sfafe 
fomography, as derived in Appendix A. Thus, if is 
also useful fo consider prior disfribufions over fhe Ghoi 
sfafes of complefely positive frace preserving (GPTP) 
quantum maps. In particular, for process fomography, 
we will use fhe measure derived by BGSZ [52] fo draw 
samples from a prior over quanfum channels fhaf is 
fiducial. The resulting algorithm is unitarily invariant 
and supported over all charmels of a given Kraus rank; 
fhaf is the minimal number of Kraus operators required 
to specify the channel. 


Our basic technique is to transform the samples 
drawn from fiducial priors, in particular the default pri¬ 
ors described in Section III, to insightful priors by ap¬ 
plying a channel O to the fiducial prior 

Psample Psample )■ (1^) 

insightful fiducial 

The algebraic gymnastics below simply determine how 
to construct insightful priors with a given mean p^. 

Here, we seek to use the default priors from Section III 
to construct a prior n{p) over states with that has a de¬ 
sired mean = 'Ep^rcip], and introduces little other in¬ 
formation. The choice of the mean could be informed 
by, for example, experimental design or previous exper¬ 
imental estimates. Gritically, a prior n is not uniquely 
specified by the first moment of p over n, pp = Ep.^ 7 r[p]. 
Rather, the mean state pp only is a complete specification 
of observables measured against a state drawn from the 
prior. Indeed, different sets of assumptions can result in 
the same mean state, as illustrated in Figure 1. Thus, ad¬ 
ditional constraints are required to select an appropriate 
prior. 

We also require that n has support over all feasible 
states; for instance, those states of the appropriate di¬ 
mension [10], possibly subject to rank restrictions. By 
making this demand, our tomography procedure can 
recover from bad priors, given sufficient data; we will 
show the robustness of our algorithm in later in this sec¬ 
tion as well as in Section VI. Finally, we demand that 
our insightful priors can be sampled efficiently with the 
dimension of the state under consideration. 
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Pure state prior 


Mixed state prior 




FIG. 1. Two priors, one mean state. The prior on the left as¬ 
sumes support only on pure states while the one on the right 
includes support on mixed preparations. We choose to illus¬ 
trate our manuscript with rebits for visual clarity. 


A. Construction of Insightful Priors 


To achieve the desiderata that all feasible states are 
supported, that we can sample efficiently, and that the 
prior mean is p^, we proceed in two steps. First, we sam¬ 
ple pf from an fiducial prior (p{p) i.e. pf ^ (p. The sam¬ 
ple from fhe fiducial prior is fhen fransformed fo a sam¬ 
ple from fhe insighfful prior under a generalized ampli- 
fude damping channel (GAD) 

Fsampie = ^{Pf\e,p*) = {1 - e)p f + ep^lv\p f], (17) 


where e G [0,1] is fhe damping paramefer and p* is fhe 
fixed poinf of fhe map. For e = 0, fhe map is fhe idenfify 
charmel, while for e = 1, fhe map damps fo fhe mixed 
sfafe p*. In our mefhod e is nof a fixed number buf is 
drawn from an ensemble described by fhe befa disfribu- 
fion, i.e. e ^ Befa(a,/3). 

Thus fo defermine fhe mean of n{p) we musf defer- 
mine p^ given p ^ (p and e ~ Befa(a, j6), fhaf is 


Pf :=Ep,e[‘I’(FlGP*)] 


= E, 


{l-e)-+ep. 


fi t cc 

cc + pD ^ a + 


(18) 


where on fhe second line we have used fhe facf fhaf 
lEp.^ 7 r[p] = 1/D for priors fhaf are fiducial, and on fhe 
fhird line we have used = a/(a + fi). Inverf- 

ing fhis relafionship fells one how fo choose fhe fixed 
poinf of fhe channel (17) fo obfain a given mean 





oi + f/D J 


(19) 


Clearly we need more fhan fhe firsf momenf p^ fo spec¬ 
ify fhe prior; we musf defermine a and fi fo complefe fhe 
specificafion. The firsf consfrainf on a and fi comes from 
fhe posifivify of p*, which is a valid sfafe only if 


0C + 15 



0C + ) 


( 20 ) 


where A, are fhe eigenvalues of p^. Thus, fhe minimum 
eigenvalue Amin of p^ parfially consfrains a and by 
^min > + /^)]- lo order fo complefely defermine 

fhe paramefers of fhe befa disfribufion, we adopf fhe 
principle fhaf fhe acfion of fhe channel (17) should be 
minimized. We use fhis principle as an efficienf heuris- 
fic mofivafed by analogy wifh maximum enfropy mefh- 
ods. In ofher words, the insightful prior is as uninformative 
as possible given the constraint of the chosen mean, and with 
respect to a particular default prior. 

We fherefore choose fo minimize fhe expecfed value 
Ee~Beta[^] = oc/{oi + f), such fhaf TT is fhe closesf GAD- 
fransformed disfribufion fo (p wifh fhe given mean p^. 
This minimizafion gives 


a = 1 and f 


DAmin 
1 - DAmin' 


( 21 ) 


This consfrucfion nafurally specializes fo provide a 
procedure for esfimafing fhe bias of a coin, as discussed 
in Appendix D. 


B. Convexity and Robustness of Insightful Priors 

Note that, because e = 0 is in the support of all beta 
distributions, our prior ensures that its support is at 
least that of the given fiducial prior, supp n D supp f. 
For the default priors listed in Section III, the prior con¬ 
structed by our procedure has support over all states of 
the appropriate dimension. In general, the fiducial prior 
defines the states that we consider to be valid, as can be 
seen from the convexity of our construction. 

That is, if Pfi is a convex combination over states in the 
support of the fiducial prior, then supp n is the convex 
closure of supp f. On the other hand, if p^ lies outside 
of the support of the fiducial prior, then our algorithm 
chooses p* to lie outside as well, such that the support 
of the insightful prior is bigger than that of the convex 
closure of the fiducial support. 

As our procedure preserves the support of the fiducial 
prior in both cases, our procedure also defines insightful 
priors for rebits and charmels by using the real Ginibre 
and BCSZ priors as fiducial priors, respectively. In par¬ 
ticular, using the BGSZ prior as the fiducial prior for a 
GAD-transformed distribution, we then obtain a prior 
that is insightful and is supported over all completely- 
positive and trace-preserving maps of a given dimen¬ 
sion. Together with the Ghoi-Jamiolkowski isomor¬ 
phism described in Appendix A, we can apply SMG to 
process tomography with little further effort. 

More exotic fiducial priors, such as distributions over 
stabilizer states or mundane states in the sense de¬ 
scribed by Veitch et al. (zero-mana) [53], can also be 
used, provided that they can be efficiently sampled and 
have the maximally-mixed state as their mean. For ex¬ 
ample, a prior that is insightful for mixed rebits is given 
in Figure 2. 


a 


> 0 Vi, 
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FIG. 2. (Left) A default prior (rank-2 Ginibre ensemble) for a 
single rebit. This fiducial prior is transformed into a prior that 
is insightful (right) by first choosing a mean, in this case Pfi = 
2(1 + |i7z -|- ^dx). Next, the generalized amplitude damping 
channel consistent with this mean state, as in (17), is applied 
to the samples. 


Before proceeding, however, we note that our notion 
of a fiducial prior does not imply that such priors are 
uninformative — indeed, we have seen that they serve to 
define what states are considered valid at all. Indeed, as 
Wasserman states [47], "by definition, a prior represents in¬ 
formation. So it should come as no surprise that a prior cannot 
represent lack of information." As a further example, con¬ 
sider a prior over states of a given purity r; for a qubit, 
such states form a shell inside of the Bloch sphere. This 
prior conveys conveys information about what states 
are considered feasible at all, but still reports as its initial 
estimate that the state of interest is maximally mixed, i.e. 
it is fiducial, and can be used as input to our algorithm. 
Indeed, were one to do so, our algorithm would use this 
specification of what a feasible state is to define an in¬ 
sightful prior that is limited to a convex hull of the ball 
of states of purity no greater than r and the desired mean 
state (provided Tr(p^) < r, such that lies within the 
given purity ball). Taking the case as r ^ 1/D (that is, a 
(^-function prior supported only at the maximally-mixed 
state), the situation becomes more extreme, in that the 
insightful prior is then supported only on the line con¬ 
necting the maximally-mixed state to p*. For this reason, 
the default priors given in Section III are chosen to have 
support over all states of a given dimension and rank, 
making them especially useful inputs to our algorithm. 

Finally, it is vital that the prior we have suggested is 
robust. In Figure 3 we choose the mean of our insightful 
prior to be almost orthogonal to the true state. After 300 
random Pauli measurements, the posterior has support 
on the true state and the mean of the posterior is ap¬ 
proximately the true state. Thus, even if the mean of the 
prior is woefully wrong our procedure is robust in that 
it provides a reliable estimate. This robustness to a bad 
initial prior requires additional data to be collected, such 


FIG. 3. A demonstration of the robustness of our insightful 
prior. (Left) A prior that is insightful in the mean p^ = 2(1“ 
jqCTx). The true state, ptme = 2(1 + almost orthogo¬ 

nal to the mean of the prior and has purity Tr[pjj.^g] = 0.905. 
(Right) After 30 random Pauli measurements (10 shots each) 
the posterior mean is centered on the true state. A covari¬ 
ance ellipse at three standard deviations is drawn in as well, 
indicating the normal approximation to a 99% credible region. 
Notice that the ellipse slightly leaks out of the state space; the 
SMC approximation is rich enough to avoid this problem. An 
animated version of this figure is available online [54]. 

that useful prior information can accelerate experiments 
but will not, in general, lead to wrong conclusions. We 
explore this robustness further in Section VI. 


V. TOMOGRAPHIC STATE TRACKING 

In this section we use a generalization of particle filter¬ 
ing methods to track a stochastic processes. In the con¬ 
text of quantum state tomography, state-space methods al¬ 
low us to characterize a stochastically-evolving source 
without having to ignore all previous data. We call the 
resulting method tomographic state tracking. 

In particular we use the CONDENSATION algorithm, 
which interlaces Bayes updates with updates to move 
sequential Monte Carlo particles (using drift and diffus¬ 
ing of the particles), to follow a stochastic process [55]. 
This technique has since been applied in a variety of 
other classical contexts [29, 56] as well as in quantum 
information [37]. Such methods, collectively known as 
state-space particle filtering, are useful for following the 
evolution of a stochastic process observed through a 
noisy measurement. 

We now briefly explain the CONDENSATION algo¬ 
rithm, readers interested in further details are directed 
to the original paper [55]. Consider Figure 4 which il¬ 
lustrates the CONDENSATION algorithm for tracking a 
coin with a d 5 mamical bias Pr(Heads) = p{t). The cur¬ 
rent posterior of the coin bias, i.e. Figure 4(a), is given 
an SMC representation in Figure 4(b). In Figure 4(c), the 
true probability mass function changes— at this point. 
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The true probability mass function 



Particle filter representation of probability mass function 


(b) 


0 


• • 


• ••• • 



Dynamical change of true probability mass function 

(C) 

0 1 


Drift and diffusion of i'th particle according to N(;u(i), (T^(i)) 





A Bayesian update of particle weights is necessary when 
a new measurement result is obtained. 



FIG. 4. Illustration of state-space tracking for particle filters 
on a coin state via the CONDENSATION algorithm [55]. From 
top to bottom, (a) we start with a continuous posterior over 
state space, (b) then discretize by sampling to obtain our initial 
SMC / particle filter approximation, (c) a dynamical change of 
the true distribution occurs, (d) we then perturb each particle 
by a Gaussian and (e) truncate to valid probabilities to com¬ 
plete the diffusion step, and (f) perform a Bayes update on the 
next datum. The final posterior approximation then forms the 
new approximation (b) for the next diffusive update. As ex¬ 
plained in the main text, the Bayes update actually updates 
the joint distribution of the parameters to be estimated and 
the drift and diffusion parameters. This means the drift and 
diffusion "co-evolve" with the posterior, which is at the heart 
of the CONDENSATION algorithm. 


we step through the CONDENSATION algorithm to track 
this change. Each SMC particle is perturbed by a Gaus¬ 
sian. In particular, the ;th particle is perturbed by a 
Gaussian with mean }i{i) and variance 

In keeping with the terminology used by Isard and 
Blake [55], the mean of the perturbation is termed the 
drift, and allows one to track deterministic evolution 
of a probabilify mass funcfion; fhis becomes evident if 
cr^{i) = 0 for all i. Similarly, the variance of fhe perfur- 
bation is formed diffusion and will allow the algorithm 
to track a stochastic process. As we will detail further 
below, both the drift and diffusion paramefers can be 
learned online, such fhaf we do nof require fhem fo be 


known a priori. 

Somefimes fhe perturbafion by fhe Gaussian will 
cause fhe parficles fo fall oufside of fhe sfafe space, in 
fhis case fhe unit interval, see e.g. Figure 4(d). In this 
situation, we modify fhe CONDENSATION algorifhm fo 
fruncafe parficles fo be valid probabilifies, complefing 
fhe Gaussian perfurbafion sfep, see Figure 4(e). Finally, 
we obfain fhe nexf dafum and perform a Bayes updafe 
on fhe nexf dafum. The final posferior approximafion. 
Figure 4(f), fhen forms fhe new approximation (b) for 
the next diffusive updafe. 

Inferesfingly fhe CONDENSATION algorifhm sfarfs 
wifh a joinf disfribufion over fhe paramefers fo be es- 
timafed. For fhe coin case, fhese are fhe bias p{t), 
and fhe disfribution paramefers for driff and diffusion 
N{}i{i),cr^{i)). Thus, when fhe Bayes updafe occurs fhe 
driff and diffusion paramefers are updafed as well, even 
fhough fhe likelihood does nof explicifly depend on 
fhese paramefers, which is referred fo as co-evolufion. 
If is fhis co-evolufion fhaf powers fhe fracking capabili- 
fies of fhe CONDENSATION algorifhm. 

As fhe learning of deferminisfic evolufion of sfafes 
is well-undersfood [44, 57, 58], we will suppose fhat 
fhe sfate under sfudy is wifh respecf fo a frame fhat 
has already been well-characterized. Thus, the domi¬ 
nant remaining d 5 mamics of fhe sfafe under sfudy are 
sfochasfic, such fhaf we need nof consider driff updafes 
in our sfafe-space model. Even wifh fhis assumpfion our 
model can sfill frack deferminisfic evolufion, however, 
provided fhaf fhe diffusion is sfrong enough fo include 
fhe frue evolufion wifh high probabilify (see Figure 5 for 
an example of tracking deterministic evolution with dif¬ 
fusion alone). 

Goncrefely, we updafe parficle i by Pif^) i—t Pifk+i) 
by firsf adding driff and diffusion ferms fo find a sfep 
Apifi^) fo fake in sfafe-space, fhen fruncafing fhe neg- 
afive eigenvalues of (0,(4) + Apftf). For each particle 
Pi, we let Apifif) = Ap At], where Ap = Apff) is a 
deterministic drift, and where each traceless component 
of At] is drawn from a Gaussian N(0, c^) wifh sfandard 
deviafion a. As sfafed above, we work in a frame where 
fhe deferminisfic parf has been faken ouf so fhaf Ap = 0 
for all parficles and for all fime. The diffusion sfandard 
deviafion a is fhen faken fo be a funcfion of evolufion 
fime and fhe new model paramefer t], a = \/tk+i — hp. 
This allows fhe evolufion rafe fo "co-evolve" wifh fhe 
sfafe model p, as described above. 

Diffusion is complefed by finding fhe specfral de- 
composifion of Pi(tf) + Apiffj, fhen fruncafing and 
renormalizing. In parficular, lef Piff) + Ap,(fj-) = 
Ljhj \fi,j) Then, 


Pi{fk+l) 


1 I Ty (*/’/,;■ I if ^i,j — 0 

Ky\o ifAy<0' 


where A/[- is chosen such fhaf Tr(p;(fj-+i)) = 1. This 
fruncafion rule avoids expensive opfimizafion fo find 
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# of Observations 


FIG. 5. Diffusive tracking of a coin state (bias). This figure illustrates that the CONDENSATION algorithm can track a deterministic 
process even when "drift" terms in the CONDENSATION algorithm are not updated. Here, we consider a coin with the true coin 
state evolving as a two-tone sinusoidal function = | [2 -|- cos(27T/itj-) -|- cos(27r/2tj-)]< sampled at discrete times fj- = kAt 
where Af is the time between samples. In this figure, the frequencies are fi =1/80 and f2 = \l 294. 


the closest state consistent with a given drift and dif¬ 
fusion update while still generalizing methods 

known to be effective and efficient for coin esfimafion. 

In order fo defermine fhe limifations of sfafe space 
fracking, we considered tracking a single tone cosine 
= cos{2TzfkAt), where fj- = kAt is a discrete 
time and Af is the time between samples. Recall that 
we are choosing not to include deterministic evolution 
(i.e. "drift") in our model, thus the following obser- 
vafion only applies fo purely diffusive fracking. We 
found fhe our algorifhm could hack a frequency up fo 
/track = (1/10) X (1/Af). Af higher frequencies, our ap¬ 
proach failed fo hack the oscillatory behavior of p(f), 
in thaf if would reporf p(4) = 1/2 for all fime. This 
modality can be tested using model selection [12, 37], 
such that more a appropriate algorithm can be used in 
that case. A more sophisticated, though less quantita¬ 
tive, analysis of fhis failure can be found in Appendix E. 

VI. NUMERICAL EXAMPLES 

We now show examples of Bayesian sfafe and process 
tomography using sequenfial Monfe Carlo, wifh priors 
fhaf are respecfively defaulf and insighfful. These exam¬ 
ples were generated using QInfer [39]. 

For sfafe tomography, we demonsfrafe our mefh- 
ods by learning qufrit sfafes, as shown in Figure 6. 
We demonsfrafe fhe performance of our algorifhm in 
fhis case by reporfing fhe risk, defined as fhe expected 
quadrafic toss over repefifions of fhe algorifhm, 

r{p, n) = Ep,^„[|| Ip)) - Ip)) II 2 ]. (23) 

We use each of a defaulf, insighfful and unbiased, and 
an insighfful buf biased prior. In all fhree cases, we draw 


fhe "frue" sfafes p from fhe prior fhaf mafches fhe in¬ 
sighfful and unbiased prior. 

In Figure 6, we also verify fhaf our mefhod is robusf 
for fhe qufrif example by considering an insighfful prior 
whose mean is nearly orfhogonal fo fhe frue sfafe. No- 
fably in well over half of fhe 1200 frials considered in 
fhaf case, QInfer reporfed fhaf fhe algorifhm was likely 
fo fail, heralding fhe impacf of fhe "bad" prior. 

We fhen proceed fo consider sfafe-space quanfum 
sfafe fomography as defailed in Secfion V. We demon¬ 
sfrafe fhe performance of our sfafe-space mefhod in an 
animafion, available online [59], also see Video 1 for a 
snapshof. 

Finally, we demonsfrafe fhe applicafion of our 
mefhod fo learning quanfum channels acfing on a 
qubif. In Figure 7, we show an example of a single 
simulated quanfum process fomography experimenf, 
where fhe frue charmel and fhe insighfful prior (gen- 
erafed wifh a BCSZ fiducial prior) agree. The prepa¬ 
ration and measurement settings are chosen to be el¬ 
ements of fhe Pauli basis. Specifically, 20% of fhe ex- 
perimenfs use random Pauli preparation and measure- 
menfs, while 80% of fhe experimenf use seffings fhaf 
maximize ,Ei\Lp\pJ,Eijj ouf of 50 randomly pro¬ 
posed Pauli preparafions and measuremenfs; fhaf is, 
adapfively chosen fo overlap wifh fhe principal com- 
ponenfs of fhe currenf posterior. The resulting posfe- 
rior disfribufion characterizes fhe uncerfainfy remain¬ 
ing abouf fhe "frue" channel, as shown in Figure 8. In 
particular, we note the principal components of fhe pos¬ 
terior covariance matrix are themselves quantum maps 
which describe the directions of maximal uncerfainfy in 
the final posterior. For fhis example, our error is domi- 
nafed by uncerfainfy abouf fhe confribufion of fhe iden- 
fify and Hadamard charmels, as is made clear by visual 
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- Uninsightful 

if- Insightful (unbiased) 



Number of Experiments 
(20 shots each) 

FIG. 6. Risk of SMC-based tomography for a qutrit, 
using three different priors, with true states drawn from 
the generalized amplitude damping distribution for = 
diag(0.9,0.05,0.05). The default prior is taken to be the 
full-rank qutrit Ginibre distribution, while the insightful 
prior is taken to be identical to the actual distribution 
over true states and the biased prior uses the mean p^, = 
diag(0.87,0.065,0.065). The nearly orthogonal prior is taken 
to be insightful with the mean p^, = diag(0.065,0.065,0.87). 
Risk is measured with respect to the quadratic loss function 
on vectorized density operators, L{p,p) = || |p)) — |p)) || 2 . The 
risk is averaged over 1200 trials. Measurements are chosen to 
be rank-1 projectors onto randomly drawn single-qutrit sta¬ 
bilizer states. Each such stabilizer state is then measured 20 
times. 

inspection in Figure 8 (bottom right). 

VII. CONCLUSIONS 

In this work, we have provided a new prior distri¬ 
bution over quantum states and channels that allows 
for including experimental insight, a software imple¬ 
mentation for numerically approximating Bayesian to¬ 
mography, and a mefhod for fracking fime-dependenf 
sfafes. Togefher, our advances make Bayesian quan- 
fum tomography practical for currenf and fufure exper- 
imenfs. In particular, our mefhods allow for exploif- 


ing well-known benefifs of Bayesian mefhods, including 
credible region esfimafion, hyperparameferizafion and 
model selection. 

We note, however, fhaf our insighfful prior on sfafes 
and channels is completely specified by ifs firsf momenf. 
An inferesfing and open problem would fhus be to de¬ 
velop a prior on sfafes and channels fhaf is complefely 
specified by ifs firsf and second momenfs. 

Finally, wifh respecf to fhe fomographic sfafe fracking 
mefhods presented in Secfion V, if is worfh nofing fhaf 
van Enk and Blume-Kohouf [12] suggested fhaf model 
selection could be used to defermine if a source was 
driffing or diffusing. In fhis manuscripf we have pro¬ 
vided a way fhaf allows one fo hack a source fhaf is 
driffing or diffusing. If is also possible fo combine fhe 
approaches and use model selection fo defermine when 
fracking is necessary or when a sfafic model is sufficienf 
as demonsfrafed by Granade [37]. 

In shorf, wifh consfrucfive mefhods for sampling 
from insighfful priors, and wifh modern sfafisfical 
mefhods, Bayesian sfafe and process fomography are 
made practical for currenf experimenfal needs. This in 
fum allows us fo explore new questions in fomography, 
and fhus beffer characferize and diagnose quanfum in¬ 
formation processing sysfems. 
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distribution with mean 0.006. Each measurement consists of 25 shots along a random Pauli axis. 
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FIG. 7. Example quadratic loss between estimated and true channels for SMC-based tomography for a qubit channel A[p] = 
0.7p + 0.3HpH, where Tf is a Hadamard. The prior for both the SMC estimator and the true channel is taken to be the insightful 
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experiments, 20% are chosen to be random Pauli preparation and measurements, while 80% are chosen adaptively. In particular, 
adaptation proceeds by proposing 50 different random pairs of Pauli preparations and measurements, then selecting the one 
which has the maximum expectation value ((E)) = ((P, M|E|P, M)), where E is the current covariance superoperator. 
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Appendix A: Choi-Jamiolkowski Isomorphism and QPT 


In order to interpret (la) as specifying probabilities for quanfum process fomography [61] as well as sfafe fomog- 
raphy we use fhe Choi-Jamiolkowski isomorphism and represenf a h 5 rpofhesis abouf fhe channel A as a sfafe 


/(A) :=(1®A)(|1))((1|). 


(Al) 


The formalism of using fhe Choi-Jamiolkowski isomorphism [62, 63] fo describe open quanfum processes has also 
been recenfly defailed in defail by Wood et al. [64] using fensor network diagrams wifh parficular regard fo quanfum 
fomography and ifs generalizafions [65,66]. 

In fhe case fhat quanfum process fomography is carried ouf by ancilla-assisfed process fomography [67], we are 
done, as /(A) is, up to normalization, the state used in A APT. On the other hand, we can also use that for any linear 
operafor X, A(X) = Trj [/(A) (1 O X^)] fo represenf preparafion followed by evolufion under A and measuremenf as 
a measuremenf of /(A) /D [68]. In parficular, suppose fhaf a QPT experimenf is performed in which each observafion 
consisfs of a measuremenf effecf E and a preparation p, 


Pr(E'|/(A))=Tr[EA(p)] 

= Tr[(l®E)/(A)(pT0i)] 

= {p\E\]{A))) 

= ((P,E|/(A)/D)), 


(A2a) 

(A2b) 

(A2c) 

(A2d) 


where P := p^ ■ D and E' = P O E. This now has the form of a measurement P O E being made on a state /(A) /D, 
such that (la) completely describes the outcomes of bofh in-place and ancilla-assisfed QPT experimenfs. Though we 
used fhe column-sfacking basis fo derive fhis equivalence, implicifly defining a convention for fhe Choi sfafe, we can 
inserf a unifary operafor on fhe space of vectorized operators fo observe fhaf (A2d) is nof dependenf on fhe choice 
of basis. 

The likelihood after N measuremenfs is 


N 

£(/(A)) = Pr (M|/(A)) = nPr(E'l/(A)), (A3) 

where A4 is fhe sfring of measuremenf resulfs and E'- is fhe observafion of fhe i'fh generalized measuremenf. Nofe 
fhaf (A3) is of fhe same form as (la), such fhaf quanfum process fomography can be freafed as a special case of 
quanfum sfafe fomography, provided fhaf we use an appropriafe fiducial prior. 


Appendix B: Review of Sequential Monte Carlo 

The purpose of fhis appendix is fo provide a minimal summary and poinf fhe inferesfed reader fo fhe relevenf 
liferafure on fhe parficle filfering and sequential Monfe Carlo (SMC) algorifhms as applied fo Bayesian inference. 
There is a wealfh of information in the references fhaf need nof be reproduced here. 

The basic idea of SMC is fo approximafe a disfribufion over some paramefers x conditioned on fhe Jfh dafa dj, i.e. 
Pr(x|dy). The approximation uses a weighted sum of delfa-functions, specifically 

N 

Pr(xldj) w ^ zvt:{dj)S{x - x,t), (Bl) 

k=-[ 

where fhe N objecfs Xj are called particles and w^{dj) is fhe weight of fhe kth parficle. Here (Bl) should be compared fo 
(11) of fhe main texf; where we suppressed many fechnical and nofational defails. As fhe number of parficles N in¬ 
creases fhe qualify of fhe approximafion improves. The initial particles are chosen by sampling the prior distribution 
over X, and are taken to have uniform weighf. 

The nexf step is fo updafe fhe posferior disfribufion given new dafum say fhe j -|- Ifh dafa is dy+i. Then using fhe 
likelihood funcfion fhe parficle weighfs are updated from fhe previous weighfs {u>^{dj)) using 


Wk{dj+\) « Pr(d^+i|xjt)a;t(dy). 


(B2) 


Clearly, fhe parficle weighfs after fhis updafe need fo renormalized such fhaf Wfc(dy+i) = 1. 
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As updates proceed, the concentration of weights on the most plausible particles causes the distribution to be sam¬ 
ples by a much smaller number of effective parficles, as measured by fhe effecfive sample size Uggs = 1/ 

Thus, periodically parficle locafions have fo be perfurbed and fhe weighfs resef fo uniform, restoring numerical 
sfabilify. This is called resampling. A video demonsfrafing fhe Bayesian updafe and resampling steps for a simple 
Hamiltonian learning problem is available online [69]. 

In fhe remainder of fhis appendix we provide references which can be used fo obfained more defails on SMC. 
Doucef et al. have provided a more fhrough review from fhe perspective of classical sfafisfics [28,46]. In fhe quanfum 
domain summaries of fhe SMC algorifhm can be found in references [4, 25, 26, 44, 70, 71]. Finally, Svensson has 
provided a useful video fuforial illusfrafing fhe applicafion of parficle filters in a simplified radar applicafion [72]. 

For our applicafion (quanfum tomography), Huszar and Houlsby [1] and by Ferrie [2] suggested SMC as numer¬ 
ical fool for implementing (5). Of particular inferesf is Ferrie's recent work on tomographic region estimators with 
SMC [25]. Ref. [37] has provided a summary of recenf applicafions in quanfum informafion. 


Appendix C: Sampling Default Priors 


In this Appendix we review the algorithms necessary to sample from defaulf priors, provided by References [48- 
50, 52]. 


Algorithm 1 ZS algorithm [49] for sampling from fhe Ginibre mafrix ensemble. 

Input: Dimension D and K. 

Output: An D x K matrix, drawn from the Ginibre ensemble, 
function GinibreMatrix(D, K) 

Generate an D x D matrix G with elements drawn independently from (N(0,1) -|- i ■ N(0,1)). 

return G 
end function 


Algorithm 2 Mezzardi algorithm [50] for sampling from fhe Haar measure. 
Input: Dimension D. 

Output: A imitary operator U € drawn from the Haar measure, 

function HaarUnitary(D) 

Z ~ GinibreMatrix(D, D) 

Q, R ■(— QR-decomposition of Z 
A <— diag(R) 

All ■< All /1 All I 

return U ■(— QA 
end function 


Algorithm 3 ZS algorithm [49] for sampling sfafes from fhe Ginibre ensemble. 

Input: Dimension D and rank K < D. 

Output: An D x D positive semidefinite matrix of rank K drawn from the Ginibre ensemble, 
function GinibreState(D, K) 

A ~ GlNIBREMATRIX(D,k) 
p ^ AA+/Tr(AA+) 

return p 
end function 
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Algorithm 4 OSZ algorithm [48] for sampling from Bures measure. 

Input: Dimension D. 

Output: An D x D positive semidefinite matrix of rank K drawn from the Bures measure, 
function BuresState(D) 

A ~ GinibreMatrix(D, D) 

U ~ HaarUnitary(D) 

p ^ (1 + U)AA^(1 + Lr+)/Tr[(l + U)AA^(t + U^)] 

return p 
end function 


Algorithm 5 BCSZ algorithm [52] for sampling from a uniform ensemble of CPTP maps. 

Input: Dimension D, Kraus rank K 

Output: A X Choi matrix /(A) of a channel drawn from the BCSZ distribution, 
function BCSZChannel(D) 

X ~ CinibreMatrix(d2,K) 
p ^ XX+ 

Y ■(— Tr 2 p > Tr 2 indicates the partial trace over the second copy of C^. 

Z ^ (1 (g) y-h2)p(]L (g, Y-h2) 

return ZD 
end function 


Appendix D: GADFLI Prior for Coins 

A prior distribution over the bias of a coin is a special case of a qubif and rebif prior. If corresponds fo a prior over 
one axis of fhe Bloch sphere. Wifhouf loss of generalify, we will fake fhaf axis fo be fhe z-axis, such fhaf fhe densify 
operafor for a coin is given by 


p {o i-p) 2^2 2^0 1-2 y 

where z = 2p — 1 is fhe z-axis coordinafe on fhe Bloch sphere at which the coin state is positioned. 


(Dl) 



FIC. 9. Two different priors on coins, each with the same prior mean: = 1/3. This figure should be compared to Figure 1. 

Given the result of a coin foss r G {0,1}, we assign a likelihood for fhaf oufcome conditioned on fhe coin p as 

Fr{r\p) = pY, (D2) 

where fhe probabilify of heads is Pr(0|p) = p. Using fhe densify operafor definition of a coin, we define fhaf fhe 
minimum eigenvalue of a coin is Amin = min(p, 1 — p). 
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FIG. 10. A histogram of the Generalized Amplitude Damping (GAD) prior for the coin, with pp 6 {1/16, 4/16, 15/16}. The 
histogram was generated with 4,000,000 samples from the GAD prior. The Gad prior is symmetric about p^, = 1/2 even though 
Pfi = 1/2 is not an allowed value. 


We draw samples of the coin state p from our GAD prior by first choosing e ~ Beta (a, /3) and p to be sampled from 
a fiducial prior such that ]E[p] = 1/2. Through the rest of this example, we use p ~ Uniform(0,1). The GAD-prior 
sample p' is obtained by transitioning p with a linear function O given by 

p'= ^[p\e,p^] = {1 - e)p + ep*. (D3) 


The expectation over this GAD prior then gives the Bayesian mean estimator before any data is collected. Using that 
p is fiducial (has mean 1/2) and that E[Beta(a, ji)] = a/ (a + fi), we find that 

Pf ■= ^P,eMp\e>P*)] = ■ I + -^^P*' (D4) 

and p^i 7 ^ 1 /2. Inverting, we find that the fixed point p* needed to guarantee that the prior mean is p^ is given by 


a + /I 




cc + ^ 2j' 

which is analogus to equation (19). In order for p* to be a valid coin, this constrains 

P 1 

We find a and ^ consistent with this condition and such that the mean 

r 1 IX- 

Ee e = 


a + ^ 


(D5) 


(D6) 


(D7) 


is minimized. This represents that the GAD channel used to define samples does as little as possible to transform 
fiducial (uniform) samples to our prior. Minimizing subject to the constraints that a > 0, > 0 and p* > 0, we 

obtain that 


OL = 1 and /I 


2p, -1 

'^Pf 

l-2p. 


1 ifp;,>l/2 

iipp < 1/2 


(D8) 


We can write this in terms of Amin to obtain the final GAD-prior condition. 


OL = 1 and {5 


2 Amin 

1 - 2Ami 


(D9) 


which should be compared to (21). Using this condition, we obtain the coin priors shown in Figure 10. 
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Appendix E: An estimate of the tracking frequency 

We now give an order of magnitude estimate for maximum frequency that is "smoothly" trackable by our algo¬ 
rithm. It is possible our algorithm can track higher frequencies but the evolution is likely to be fairly discontinuous. 

Our estimate for the tracking frequency comes from sampling arguments in tracking a bias of a coin, with prob¬ 
ability Pr(0|p) = p for heads. We will assume the coin's bias changes from Pr(0|p) = p to Pr(0|p) = p ± e where 
e 1. The question is how quickly can we detect such a change, i.e. how many equally spaced in time samples does 
it take to notice the difference of ±e. 

We must specify an error tolerance-e-for our sensing protocol and a confidence interval-Z-for our estimate of 
p. It turns out if we use the trace distance between two coins p = diag(p, 1 — p) and a = diag(p ± e, 1 — p ± e) 
then D{p,a) = (l/2)Tr|p — cr\ = e. These two specifications will in turn (approximately) determine the number of 
samples required and therefore the bandwidth of our tracking protocol. We choose Z = 1.9599 which corresponds to 
a 95% level of confidence for our estimator and the maximum acceptable error jptrue ~ PestI < £• Using the standard 
deviation of the Bernouli distribution \/p('L — p)/N, we have e = Zy/p{l — p)/N. The largest standard deviation 
(worst case) is when p = 1/2 this gives e = Z/(2\/N). Rearranging for N gives N = Z^/(4e^). To obtain this 
many samples we must measure for a time Tmeas = AtN, where At is the time between measurements. Thus Tmeas 
is effectively the time between our samples of p(t). Naive arguments from the Nyquist-Shannon sampling theorem 
imply that we can not determine frequency components of p(t) greater than /max = 1/ (2Tmeas)/ which is called the 
is detection frequency bandwidth. The implication is we can track, for example, p(t) = 1/2-\- e sin (/max)- 


Appendix F: QInfer Tomography Tutorial 

In this Appendix, we demonstrate the use of QInfer for Bayesian state and process tomography. In particular, we 
show how to estimate states and channels given data S 5 mthesized from a description of a true state, and discuss how 
to obtain region estimates, covariance superoperators and other useful functions of tomography posteriors. We then 
discuss how to apply these techniques in experimental systems. 

The tomography implementation in QInfer is based on QuTiP and NumPy, so we start by importing everything 
here. 

In [1]: import numpy as np 
import qutip as qt 
import qinfer as qi 

As a first step, we define a basis for performing tomography; the choice of basis is largely arbitrary, but depending 
on the experiment, some bases may be more or less convienent. Here, we focus on the example of the single-qubit 
Pauli basis B = {l,crx,cry,tTz}. 

In [2]: basis = qi.tomography.pauli_basis(1) 
display(basis) 

<TomographyBasis dims=[2] at 362504488> 


We will get a lot of use out of the Pauli basis, so we also define some useful shorthand. 

In [3]: I, X, Y, Z = qt.qeye(2), qt.sigmaxQ, qt.sigmayO, qt.sigmazQ 

Basis objects are responsible for converting between QuTiP's rich Qob j format and the unstructured model param¬ 
eter representation used by Qinfer. 

In [4]: display(basis.state_to_modelparams(I / 2 + X / 2)) 
array([ 0.70710678, 0.70710678, 0. , 0. ]) 

In [5]: display(basis.modelparams_to_state(np.array([1, 0, 0, 1]) / np.sqrt(2))) 
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Quantum object: dims = [[2], [2]], shape = [2, 2], t 5 ^e = oper, isherm = True 

( 1.000 0.0 \ 

0.0 0.0 ) 

Having defined a basis, we then define fhe core objecf describing a tomography experiment, the model. In QInfer, 
models encapsulate the likelihood function, experimental parameters and other useful mefadafa abouf fhe experi- 
menfal properties being estimated. In our case, we use TomographyModel to describe the single-shot experiment, and 
BinomialModel to describe batches of fhe single-shof experimenf. 

In [6]: model = qi.BinomialModelfqi.tomography.TomographyModel(basis)) 
display(model) 

<qinfer.derived_models.BinomialModel at 0x159b6048> 


A Model defines a vector of model paramefers; for a single qubif TomographyModel, fhis is a vecfor of lengfh 4, each 
describing a different element of fhe Hermitian operator basis. Each Model also defines experiment parameters as a 
NumPy record array. A record then describes a single measurement of fhe model. 

In [7]: displayfmodel.expparams_dtype) 

[(’meas’, float, 4), (’n_meas’, ’uint’)] 


In this case, the experiment parameters record has two fields: meas and n_meas. The first is a vector of four floafs 
corresponding to |m| = (((Bo|M)), ((Bj |M)), ((B 2 IM)), ((B 3 IM))). The second is an unsigned integer (uint) describing 
how many times that measurement is performed. For instance, measuring (1 -|- < 7 z) /2 40 times is given by the array: 

In [8]: expparams = np.array([ 

# Each tuple, marked with (), defines a single record. 

( 

# Within each tuple, fields are separated by commas. 

# The fields follow in the order given by the model, 

# so the first field is meas, a length-4 vector. 

[1 / np.sqrt(2), 0, 0, 1 / np.sqrt(2)], 

# The second field is then the number of measurements. 

40 

) 

], 

# We finish building the array by passing along the right data\ 

# type to NumPy. This is somwhat of a QInfer idiom. 
dtype=model.expparams_dtype) 

display(expparams) 

array([([0.7071067811865475, 0.0, 0.0, 0.7071067811865475], 40L)], 
dtype=[(’meas’, ’<f8’, (4,)), (’n_meas’, ’<u4’)]) 

The fields of a record array can be obtained by indexing. For instance, the [meas] field is then a 1 x 4 array, with 
the first index allowing for a sequence of measurements to be described at once. 

In [9]: display(expparams[’meas’]) 

array([[ 0.70710678, 0. ,0. , 0.70710678]]) 


Note that by convention, meas is normalized to 1/ y/d. 

Often, we will not construct experiments directly, but will instead rely on QInfer's heuristics (described below). 
In any case, once we have a model, the next step is to create a prior. QInfer comes with several useful fiducial priors, 
as well as insightful priors constructed from amplitude damping charmels. For instance, to create a Hilbert-Schmidt 
uniform prior constrained to rebits, we use the GinibreReditDistribution: 
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In [10]: fiducial_prior = qi.tomography.GinibreReditDistribution(basis) 


In [11]: qi.tomography.plotting_tools.plot_rebit_prior(fiducial_prior, rebit_axes=[1, 3]) 




- 1.0 


-0.5 


0.0 


0.5 


1.0 


\p} 


Here, we have told QInfer that we wish to treat dx and as our rebit axes using the rebit_axes=[1, 3] argument. 
Insightful priors can be constructed by specifying a fiducial prior and a QuTiP Qob j represenfing fhe desired mean. 


In [12]: prior_mean = (I + (2/3) * Z + (1/3) * X) / 2 
display(prior_mean) 


Quanfum objecf: dims = [[2], [2]], shape = [2, 2], t 5 ^e = oper, isherm = True 


/ 0.833 0.167 \ 
0.167 0.167 ) 


In [13]: 


prior = qi.tomography.GADFLIDistribution(fiducial_prior, prior_mean) 


In [14]: qi.tomography.plotting_tools.plot_rebit_prior(prior, rebit_axes=[1, 3]) 
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Having constructed a prior and a model, we can now continue to perform Bayesian inference using SMC. We 
demonsfrafe using fhe frue sfafe p = 1/2 + (2/3)£7’z/2 wifh fhe prior mean = 1/2 + (4/5)crz + {l/7)ax- 


In [15]: basis = qi.tomography.pauli_basis(1) 

model = qi.BinomialModelfqi.tomography.TomographyModel(basis)) 
true_state = basis.state_to_modelparams( 
I/2+(2/3)*Z/2 
)[np.newaxis, :] 

fiducial_prior = qi.tomography.GinibreReditDistribution(basis) 
prior = qi.tomography.GADFLIDistribution(fiducial_prior, 
I/2+(4/5)*Z/2+(1/7)*X/2 

) 


In [16]: qi.tomography.plotting_tools.plot_rebit_prior(prior, true_state=true_state, rebit_axes=[1, 3]) 
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The updater and heuristic classes track the posterior and the random-measurement experiment design, respec¬ 
tively. 


In [17]: updater = qi.smc.SMCUpdater(model, 2000, prior) 

heuristic = qi.tomography.RandomPauliHeuristic(updater, other_fields={ ’n_meas’ : 40}) 


We s 5 rnthesize data for the true state, then feed if info fhe updafer in order fo obfain our final posferior. 


In [18]: for idx_exp in xrange(50): 

experiment = heuristicf) 

datum = model.simulate_experiment(true_state, experiment) 
updater.update(datum, experiment) 


In [19]: pit.figure(figsize=(10, 10)) 

qi.tomography.plotting_tools.plot_rebit_posterior( 
updater, prior, true_state, 
rebit_axes=[1, 3] 
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We can use our tomography basis object to read out the estimated final state as a QuTiP Qobj. 

In [20]: est_mean = basis.modelparams_to_state(updater.est_mean()) 
display(est_mean) 

Quantum object: dims = [[2], [2]], shape = [2, 2], t 5 ^e = oper, isherm = True 

( 0.814 0.014 \ 

1^ 0.014 0.186 ) 

As discussed in the main text, the posterior can also be described by the covariance superoperator Up = Cov(|p))). 
We demonstrate by showing the Choi matrix /(Cov(|p)))). 

In [21]: cov_superop = basis.covariance_mtx_to_superop(updater.est_covariance_mtx()) 
display(qt.to_choi(cov_superop)) 

display(Latex (r"$\|\Sigma\rho\|_{{\Tr}} = {:0.4f}$" .format(cov_superop.norm(’tr’)))) 

Quantum object: dims = [[[2], [2]], [[2], [2]]], shape = [4,4], type = super, isherm = True, superrep = choi 

/ 2.854 X 10"°^ 1.622 x 10-°5 1.622 x 10-°^ 3.210 x 10-°^ \ 

1.622 X 10-°5 -2.854 x 10"°^ 3.210 x 10-°"^ -1.622 x 10-°5 
1.622 X 10-°5 3.210 X lO-^^^ -2.854 x 10-°^ -1.622 x 10-°5 

V 3.210 X 10"°^ -1.622 X 10"°^ -1.622 x 10-°^ 2.854 x 10-°^ / 


||Zp||Tr = 0.0012 

Here, we use the Hinton diagram plotting functionality provided by QuTiP to depict the covariance in each ob¬ 
servable that we obtain from fhe posferior. 

In [22]: displayfqt.visualization.hinton(cov_superop)) 








(<matplotlib.figure.Figure at 0x18218a20>, 
<matplotlib.axes._subplots.AxesSubplot at 0x18209f98>) 


25 



- 0.00060 

- 0.00045 

- 0.00030 

- 0.00015 

- 0.00000 

- - 0.00015 

- - 0.00030 

- - 0.00045 

- - 0.00060 



